IN THE SPECIFICATION



Please amend the specification by replacing the paragraph on page 25 lines 8 through 20 
with the following Table A and paragraph: 



Table A: Codon degeneracy of amino acids 



Amino acid       One letter   Three letter   Codons

Alanine          A            Ala            GCA GCC GCG GCT
Cysteine         C            Cys            TGC TGT
Aspartic acid    D            Asp            GAC GAT
Glutamic acid    E            Glu            GAA GAG
Phenylalanine    F            Phe            TTC TTT
Glycine          G            Gly            GGA GGC GGG GGT
Histidine        H            His            CAC CAT
Isoleucine       I            Ile            ATA ATC ATT
Lysine           K            Lys            AAA AAG
Leucine          L            Leu            TTA TTG CTA CTC CTG CTT
Methionine       M            Met            ATG
Asparagine       N            Asn            AAC AAT
Proline          P            Pro            CCA CCC CCG CCT
Glutamine        Q            Gln            CAA CAG
Arginine         R            Arg            AGA AGG CGA CGC CGG CGT
Serine           S            Ser            AGC AGT TCA TCC TCG TCT
Threonine        T            Thr            ACA ACC ACG ACT
Valine           V            Val            GTA GTC GTG GTT
Tryptophan       W            Trp            TGG
Tyrosine         Y            Tyr            TAC TAT



Certain amino acids may be substituted for other amino acids in a protein sequence 
without appreciable loss of the desired activity. It is thus contemplated that various changes may 
be made in the peptide sequences of the disclosed protein sequences, or their corresponding 
nucleic acid sequences without appreciable loss of the biological activity. 
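
As an illustration of the codon degeneracy listed in Table A, the following minimal Python sketch stores the table as a dictionary and looks up the synonymous codons for a given codon. The codon assignments follow the standard genetic code; the names CODONS_BY_AMINO_ACID, degeneracy and synonymous_codons are hypothetical and are not part of the specification.

```python
# Codon assignments per amino acid (one-letter code), standard genetic code,
# written as DNA codons of the coding strand. Stop codons are omitted.
CODONS_BY_AMINO_ACID = {
    "A": "GCA GCC GCG GCT", "C": "TGC TGT", "D": "GAC GAT", "E": "GAA GAG",
    "F": "TTC TTT", "G": "GGA GGC GGG GGT", "H": "CAC CAT",
    "I": "ATA ATC ATT", "K": "AAA AAG", "L": "TTA TTG CTA CTC CTG CTT",
    "M": "ATG", "N": "AAC AAT", "P": "CCA CCC CCG CCT", "Q": "CAA CAG",
    "R": "AGA AGG CGA CGC CGG CGT", "S": "AGC AGT TCA TCC TCG TCT",
    "T": "ACA ACC ACG ACT", "V": "GTA GTC GTG GTT", "W": "TGG",
    "Y": "TAC TAT",
}
CODON_TABLE = {aa: codons.split() for aa, codons in CODONS_BY_AMINO_ACID.items()}

# Reverse lookup: codon -> amino acid.
CODON_TO_AMINO_ACID = {
    codon: aa for aa, codons in CODON_TABLE.items() for codon in codons
}


def degeneracy(amino_acid):
    """Number of codons that encode the given amino acid (one-letter code)."""
    return len(CODON_TABLE[amino_acid])


def synonymous_codons(codon):
    """Other codons that encode the same amino acid as `codon`."""
    amino_acid = CODON_TO_AMINO_ACID[codon]
    return [c for c in CODON_TABLE[amino_acid] if c != codon]
```

For example, synonymous_codons("TAT") lists the other tyrosine codon, which is the kind of nucleic acid change that leaves the encoded protein sequence unchanged.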

Please replace the paragraph in the specification at page 39 lines 10 through 17 with the 
following amended paragraph: 

The short nucleic acid sequences may be used as probes and specifically as PCR probes.
A PCR probe is a nucleic acid molecule capable of initiating a polymerase activity while in a
double-stranded structure with another nucleic acid. Various methods for determining the
structure of PCR probes and PCR techniques exist in the art. Computer generated searches using
programs such as Primer3 (www[[.]]-genome.wi.mit.edu/cgi-bin/primer/primer2.cgi),
STSPipeline (www-genome.wi.mit.edu/cgi-bin/www.STS_Pipeline), or GeneUp (Pesole, et al.,
BioTechniques 25:112-123, 1998), for example, can be used to identify potential PCR primers.

Please replace the paragraph in the specification at page 78 lines 7 through 16 with the 
following amended paragraph: 

PHRED is used to call the bases from the sequence trace files (www[[.]]-mbt.washington.edu).
PHRED uses Fourier methods to examine the four base traces in the region
surrounding each point in the data set in order to predict a series of evenly spaced predicted 
locations. That is, it determines where the peaks would be centered if there were no
compressions, dropouts, or other factors shifting the peaks from their "true" locations. Next,
PHRED examines each trace to find the centers of the actual, or observed peaks and the areas of 
these peaks relative to their neighbors. The peaks are detected independently along each of the 
four traces so many peaks overlap. A dynamic programming algorithm is used to match the 
observed peaks detected in the second step with the predicted peak locations found in the first 
step. 
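
The matching step described above can be pictured with a small dynamic programming sketch in Python. This is not PHRED's actual algorithm or cost function: it is a generic ordered matching under the assumption that pairing a predicted location with an observed peak costs their distance, and that leaving either one unmatched costs a fixed skip_cost. The function name match_peaks and all parameter values are hypothetical.

```python
def match_peaks(predicted, observed, skip_cost=3.0):
    """Match sorted predicted locations to sorted observed peak centers.

    Returns a list of (predicted_index, observed_index) pairs that minimizes
    the total cost: |predicted - observed| for each matched pair plus
    skip_cost for every unmatched predicted location or observed peak.
    """
    n, m = len(predicted), len(observed)
    # cost[i][j]: best cost using the first i predicted and first j observed.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * skip_cost
    for j in range(1, m + 1):
        cost[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = cost[i - 1][j - 1] + abs(predicted[i - 1] - observed[j - 1])
            cost[i][j] = min(pair,
                             cost[i - 1][j] + skip_cost,   # skip a predicted location
                             cost[i][j - 1] + skip_cost)   # skip an observed peak

    # Trace back from the final cell to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        pair = cost[i - 1][j - 1] + abs(predicted[i - 1] - observed[j - 1])
        if cost[i][j] == pair:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

With predicted locations [10, 20, 30, 40] and observed peaks [10.5, 19.0, 25.0, 30.5, 41.0], the spurious peak at 25.0 is left unmatched and the remaining peaks are paired in order.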

Please replace the two paragraphs beginning "Example 3:" in the specification at page 79 
line 10 through page 81 line 5 with the following two amended paragraphs: 

Example 3: Identifying genes within a genomic BAC library

This example illustrates the identification of combigenes within the rice genomic contig 
library as assembled in Example 2. The genes and partial genes that are embedded in such 
contigs are identified through a series of informatic analyses. The tools to define genes fall into 
two categories: homology-based and predictive-based methods. Homology-based searches (e.g.,
GAP2, BLASTX supplemented by NAP and TBLASTX) detect conserved sequences during 
comparisons of DNA sequences or hypothetically translated protein sequences to public and/or 
proprietary DNA and protein databases. Existence of an Oryza sativa gene is inferred if 
significant sequence similarity extends over the majority of the target gene. Since homology-based
methods may overlook genes unique to Oryza sativa, for which homologous nucleic acid
molecules have not yet been identified in databases, gene prediction programs are also used. 
Predictive methods employed in the definition of the Oryza sativa genes included the use of the 
GenScan gene predictive software program which is available from Stanford University (e.g., at
the website: www-gnomic/stanford.edu/GENSCANW.html) and the GeneMark.hmm for Eukaryotes
program from Gene Probe, Inc (Atlanta, GA) (www-geneprobe.net/index.htm).
GenScan, in general terms, infers the presence and extent of a gene through a search for
"gene-like" grammar. GeneMark.hmm searches a file containing DNA sequence data for genes. It
employs a Hidden Markov Model algorithm with a species-specific inhomogeneous Markov
model of gene-encoding regions of DNA. 

The homology-based methods that are used to define the Oryza sativa gene set included 
GAP2, BLASTX supplemented by NAP and TBLASTX. For a description of BLASTX and 
TBLASTX see Coulson, Trends in Biotechnology 12:76-80 (1994) and Birren et al. Genome 
Analysis, 1:543-559 (1997). GAP2 and NAP are part of the Analysis and Annotation Tool
(AAT) for Finding Genes in Genomic Sequences which was developed by Xiaoqiu Huang at 
Michigan Tech University [[ ]]and is available at the web site
www-genome.cs.mtu.edu/. The AAT package includes two sets of programs, one
set DPS/NAP (referred to as "NAP") for comparing the query sequence with a protein database, 
and the other set DDS/GAP2 (referred to as "GAP2") for comparing the query sequence with a 
cDNA database. Each set contains a fast database search program and a rigorous alignment 
program. The database search program identifies regions of the query sequence that are similar 
to a database sequence. Then the alignment program constructs an optimal alignment for each
region and the database sequence. The alignment program also reports the coordinates of exons 
in the query sequence. See Huang, et al., Genomics 46: 37-45 (1997). The GAP2 program
computes an optimal global alignment of a genomic sequence and a cDNA sequence without 
penalizing terminal gaps. A long gap in the cDNA sequence is given a constant penalty. The 
DNA-DNA alignment by GAP2 adjusts penalties to accommodate introns. The GAP2 program 
makes use of splice site consensuses in alignment computation. GAP2 delivers the alignment in 
linear space, so long sequences can be aligned. See Huang, Computer Applications in the
Biosciences 10:227-235 (1994). The GAP2 program aligns the Oryza sativa contigs with a
library of 42,260 Oryza sativa cDNAs. 
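
The idea of a global alignment that does not penalize terminal gaps can be sketched in Python as follows. This is a simplified illustration, not the GAP2 program: it assumes a plain linear gap penalty, does not model the constant penalty for long (intron-like) gaps or splice site consensuses, and uses quadratic rather than linear space for the traceback-free score. The function name semiglobal_score and the scoring values are hypothetical.

```python
def semiglobal_score(genomic, cdna, match=1, mismatch=-1, gap=-2):
    """Best alignment score when gaps at either end of either sequence are free.

    Leading gaps are free because row 0 and column 0 start at zero; trailing
    gaps are free because the result is the maximum over the last row and the
    last column rather than the bottom-right cell only.
    """
    n, m = len(genomic), len(cdna)
    prev = [0] * (m + 1)          # row 0: leading gaps cost nothing
    best_last_col = prev[m]
    for i in range(1, n + 1):
        cur = [0] * (m + 1)       # column 0: leading gaps cost nothing
        for j in range(1, m + 1):
            s = match if genomic[i - 1] == cdna[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align the two residues
                         prev[j] + gap,     # gap in the cDNA
                         cur[j - 1] + gap)  # gap in the genomic sequence
        best_last_col = max(best_last_col, cur[m])
        prev = cur
    return max(best_last_col, max(prev))
```

Under these assumptions, a short cDNA-like sequence embedded in a longer genomic sequence scores the same as if the flanking genomic bases were absent.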

Please replace the paragraph at page 81 lines 13 through 17 with the following amended
paragraph:

NAP takes a nucleotide sequence, translates it in three forward reading frames and three
reverse complement reading frames, and then compares the six translations against a protein
sequence database (e.g., the non-redundant protein (i.e., nr-aa) database maintained by the
National Center for Biotechnology Information as part of GenBank and available at the web site:
www-ncbi.nlm.nih.gov).
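
The six-frame translation step can be sketched in Python as shown below. This illustrates only the translation, not the NAP program or its database comparison; it assumes the standard genetic code, and the function names (reverse_complement, translate, six_frame_translations) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate `seq` starting at offset `frame` (0, 1 or 2).

    Codons containing characters other than A, C, G, T become 'X';
    a trailing partial codon is ignored.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return the three forward and three reverse complement translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq, offset)
        frames["-%d" % (offset + 1)] = translate(rc, offset)
    return frames
```

Each of the six resulting protein strings would then be compared against the protein database by whatever search program is in use.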

Please replace the paragraph at page 86 lines 3 through 7 with the following amended 
paragraph: 

Putative promoter sequences are also searched with matrices for the TATA box, GC box
(factor name: V_GC_01) and CCAAT box (factor name: F_HAP234_01). The matrix for the
TATA box is from the Eukaryotic Promoter Database (www-epd.isb-sib.ch/) and the matrices
for the GC box and the CCAAT box are from Transfac (www-transfac.gbf.de/TRANSFAC/).
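
Searching a sequence with a matrix can be illustrated by the following Python sketch of a log-odds position weight matrix scan. The matrix here is a toy built from the consensus TATAAA with made-up probabilities; it is not the Eukaryotic Promoter Database or Transfac matrix, and the threshold, background frequency and function names are all assumptions chosen for illustration.

```python
import math


def build_toy_matrix(consensus, p_match=0.85):
    """Toy probability matrix: the consensus base gets p_match at each
    position and the remaining probability is split over the other bases."""
    p_other = (1.0 - p_match) / 3
    return [
        {base: (p_match if base == c else p_other) for base in "ACGT"}
        for c in consensus
    ]


def log_odds(matrix, window, background=0.25):
    """Sum of log2(probability / background) over the window positions."""
    return sum(
        math.log2(column[base] / background)
        for column, base in zip(matrix, window)
    )


def scan(sequence, matrix, threshold=5.0):
    """Return (start, score) for every window scoring at or above threshold.

    Windows containing characters other than A, C, G, T are skipped.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set("ACGT"):
            score = log_odds(matrix, window)
            if score >= threshold:
                hits.append((start, score))
    return hits
```

For instance, scan("GGCGTATAAAGGC", build_toy_matrix("TATAAA")) reports a single hit where the consensus occurs; a real search would substitute the database matrix and a threshold appropriate to it.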

Please replace the paragraph beginning "Example 5" at page 87 line 22 through page 88 
line 10 with the following amended paragraph: 

Example 5: Identifying promoters in the genomic BAC library using an expression assay 

Promoters may also be identified based on quantitative analysis of genes that are
cis-associated with candidate promoters (i.e., the native genes). In this method, the native genes
associated with SEQ ID NO: 1 through SEQ ID NO: 5 7,467 are analyzed on a digital northern
blot. Digital northern data can be generated from EST sequencing, SAGE and other methods,
which in effect count RNA molecules expressed in a cell. This data can be generated as needed, or
is generally available to the public on a number of web sites (e.g., www-tigr.org).
Data can be obtained from any plant species, although data on rice gene expression is 
particularly preferred. Promoters are selected based on the expression information of the digital 
northern. For [[]]example, identifying genes expressed under stress-related conditions
would provide a group of promoters able to confer such stress-inducible expression to other
genes. 
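
The counting idea behind a digital northern can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each sequenced EST has already been assigned to a gene, expression is compared as a simple ratio of relative abundances with a pseudocount, and the library names, gene names and the min_fold cutoff are hypothetical.

```python
from collections import Counter


def digital_northern(est_assignments):
    """Count ESTs per gene in each library.

    est_assignments: iterable of (library, gene) pairs, one per sequenced EST.
    Returns {library: Counter({gene: count})}.
    """
    counts = {}
    for library, gene in est_assignments:
        counts.setdefault(library, Counter())[gene] += 1
    return counts


def induced_genes(counts, control, treated, min_fold=2.0, pseudocount=1):
    """Genes whose relative abundance in `treated` is at least `min_fold`
    times that in `control`. The pseudocount avoids division by zero for
    genes that were not observed in one of the libraries."""
    control_total = sum(counts[control].values())
    treated_total = sum(counts[treated].values())
    selected = []
    for gene in sorted(set(counts[control]) | set(counts[treated])):
        control_freq = (counts[control][gene] + pseudocount) / control_total
        treated_freq = (counts[treated][gene] + pseudocount) / treated_total
        if treated_freq / control_freq >= min_fold:
            selected.append(gene)
    return selected
```

The promoters cis-associated with the genes returned by induced_genes would be the candidates for conferring the corresponding expression pattern.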
Please replace the paragraph at page 94 lines 2 through 8 with the following amended
paragraph: 

Reference for transcription factors and sequence motifs: Motifs and transcription
factors are found in one of three databases: PLACE, PlantCARE or TRANS (respectively,
www-dna.affrc.go.jp/htdocs/PLACE/, www-sphinx.rug.ac.be:8080/PlantCARE/index.htm,
www-transfac.gbf.de/TRANSFAC/) or
Yoshihara et al., FEBS Letters 383, 1996, pp 213-218; or Toyofuku K et al. FEBS Lett
428:275-280 (1998) or lit3 (Huang et al Plant Mol Biol 14:655-668 (1990)),